Syncing latest changes from upstream master for drbd#3

Open: df-build-team wants to merge 113 commits into master from sync_us--master


@df-build-team

PR containing the latest commits from upstream master branch

rck and others added 6 commits April 28, 2026 14:20
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
DRBD used a custom mechanism to mark netlink attributes as "mandatory":
bit 14 of nla_type was repurposed as DRBD_GENLA_F_MANDATORY. Attributes
sent from userspace that had this bit present and that were unknown
to the kernel would lead to an error.

Since Linux 5.2 commit ef6243acb478
("genetlink: optionally validate strictly/dumps"), the generic netlink
layer rejects unknown top-level attributes when strict validation is
enabled. DRBD never opted out of strict validation, so unknown
top-level attributes are already rejected by the netlink core.

The mandatory flag mechanism was required for nested attributes, because
these are parsed liberally, silently dropping attributes unknown to the
kernel.

This prepares for the move to a new YNL-based family, which will use the
now-default strict parsing.
The current family is not expected to gain any new (mandatory)
attributes, which makes this change safe.

Old userspace that still sets bit 14 is unaffected: nla_type()
strips it before __nla_validate_parse() performs attribute validation,
so the bit never reaches DRBD.
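
The stripping described above can be illustrated in userspace. This is a minimal sketch, assuming the flag and mask values from the kernel's include/uapi/linux/netlink.h; the helper name sketch_nla_type() is hypothetical and only mirrors the masking that nla_type() performs:

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the kernel's include/uapi/linux/netlink.h. */
#define NLA_F_NESTED		(1 << 15)
#define NLA_F_NET_BYTEORDER	(1 << 14)
#define NLA_TYPE_MASK		(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER))

/* The flag this commit removes: DRBD repurposed bit 14 of nla_type. */
#define DRBD_GENLA_F_MANDATORY	(1 << 14)

/* Bit 14 is one of the two flag bits that the type mask clears. */
static_assert(DRBD_GENLA_F_MANDATORY == NLA_F_NET_BYTEORDER,
	      "the mandatory flag shares bit 14 with NLA_F_NET_BYTEORDER");

/* Hypothetical userspace stand-in for the kernel's nla_type(). */
static inline int sketch_nla_type(uint16_t raw_nla_type)
{
	return raw_nla_type & NLA_TYPE_MASK;
}
```

An attribute sent as `5 | DRBD_GENLA_F_MANDATORY` and one sent as plain `5` both yield type 5 here, which is why old userspace keeps working.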

Remove all references to the mandatory flag in DRBD.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Some of these are *ancient*, the oldest ones (debian/*) have not been
touched since 2004!
Some of the rpm-macro-fixes turn 16 this year, and they were only
useful on very old distributions.
The redhat/suse filelists were superseded by a spec macro in 2023.
The kernel-compat tests and patches are unused; they just slipped
through the usual unused checks somehow.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Remove verbose boilerplates from leading comments in all files. Only the
SPDX-License-Identifier and a copyright notice stay. The copyright year
is the year when the work was originally created, so we take the file
creation date from git for that.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
A user running DRBD 9.3.1 in 8.4-compatibility mode on a 6 TiB volume hit
"ASSERTION bitmap->bm_pages FAILED in __bm_op" after enabling d_bitmap on
a live filesystem.  The volume had been attached with d_bitmap=no (no
peers).  About 86 s into ext4 traffic, "drbdsetup disk-options
--d_bitmap=yes" reached drbd_adm_disk_opts(), which did:

	device->bitmap = drbd_bm_alloc(...);     /* publishes pointer */
	err = drbd_bm_resize(device, ..., true); /* allocates bm_pages */

drbd_bm_alloc() returns a struct with bm_pages == NULL; bm_pages is wired
up only inside the spin-locked block of drbd_bm_resize().  Concurrent IO
in __bm_op() observed device->bitmap != NULL with bitmap->bm_pages ==
NULL and tripped the assertion.

Fix this by keeping the new bitmap in a local until drbd_bm_resize() has
populated bm_pages, then publishing it via smp_store_release().  To make
that possible, thread an explicit struct drbd_bitmap * through
drbd_bm_resize() and the inner helpers it calls (____bm_op, __bm_op,
bm_op, ___bm_op, bm_realloc_pages, bm_count_bits, _drbd_bm_lock /
_drbd_bm_unlock) instead of re-reading device->bitmap.  Public wrappers
keep their device-only signatures and fetch device->bitmap once.

The intent is to make publication of device->bitmap atomic with respect
to readers and to codify the invariant "device->bitmap != NULL implies
bm_pages != NULL", so the reasonable assertion in __bm_op() can stay.
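
The publish-after-initialise pattern can be sketched in userspace with C11 atomics. This is not the DRBD code: the struct and function names are hypothetical, the structs are heavily simplified, and the acquire load on the reader side is an assumption standing in for however the kernel readers pair with smp_store_release():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; the real structs have many more members. */
struct sketch_bitmap {
	void **bm_pages;		/* NULL until the "resize" step ran */
	size_t bm_number_of_pages;
};

struct sketch_device {
	_Atomic(struct sketch_bitmap *) bitmap;
};

/* Build the bitmap in a local, populate it, and only then publish it. */
static int sketch_enable_bitmap(struct sketch_device *device, size_t pages)
{
	struct sketch_bitmap *bm;

	assert(pages > 0);
	bm = calloc(1, sizeof(*bm));
	if (!bm)
		return -1;

	bm->bm_pages = calloc(pages, sizeof(*bm->bm_pages));
	if (!bm->bm_pages) {
		free(bm);	/* never published, so no reader saw it */
		return -1;
	}
	bm->bm_number_of_pages = pages;

	/* Release store: the writes above are visible before the pointer. */
	atomic_store_explicit(&device->bitmap, bm, memory_order_release);
	return 0;
}

/* Reader side: returns 1 if "bitmap != NULL implies bm_pages != NULL". */
static int sketch_reader_invariant_holds(struct sketch_device *device)
{
	struct sketch_bitmap *bm =
		atomic_load_explicit(&device->bitmap, memory_order_acquire);

	return bm == NULL || bm->bm_pages != NULL;
}
```

Because the pointer is stored last, a concurrent reader observes either NULL or a bitmap whose bm_pages is already set, never the half-built state that tripped the assertion.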

The other two drbd_bm_resize() callers (drbd_determine_dev_size() and
drbd_adm_attach()) operate on an already-published bitmap and just pass
device->bitmap through the new parameter; behaviour there is unchanged.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
drbd_adm_attach() has the same shape as the disk-options bug fixed in
the previous commit: drbd_bm_alloc() returns a struct with bm_pages ==
NULL, and the pointer is published into device->bitmap immediately,
long before drbd_bm_resize() (called either at the on-disk-bitmap-read
step or, when there is no prior history, inside drbd_determine_dev_size())
wires up bm_pages.

In practice this window is quiescent in the attach path -- the device
is still D_ATTACHING, no peer is connected, and no IO is reaching the
bitmap -- so the assertion in __bm_op() has not been observed there.
But the pattern is fragile; defend against future regressions by
applying the same fix shape.

Fix this by keeping the new bitmap in a local until drbd_bm_resize() has
populated bm_pages, mirroring drbd_adm_disk_opts().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
df-build-team force-pushed the sync_us--master branch 7 times, most recently from 1463bf5 to 1653ca6, on May 6, 2026 08:03
Philipp-Reisner and others added 5 commits May 6, 2026 11:18
Oops, turns out this was still referenced in our build system, so SUSE
builds are broken now.

Partial revert of d107002 (drbd: housekeeping, remove some unused files).

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
The approach using `override EXTRA_CFLAGS +=` does not work on some
modern kernels, for instance 7.0.0-14-generic from Ubuntu Resolute /
26.04.

This is a partial backport of commit 4d2b2ac ("drbd: add fault
injection and debugfs for bio chaining").

Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
Today an arbitrary number of volumes can resync simultaneously, slicing
disk and network bandwidth thinly across all of them. On a host with
many volumes sharing the same physical disks or network link, total
time-to-in-sync suffers.

Introduce a runtime-changeable module parameter max_parallel_resyncs
that caps how many volumes may have an unpaused resync (or verify) at
any given time. The default of 0 preserves existing behaviour
(unlimited). Counting is per-volume (struct drbd_device): multiple
peer_devices of the same volume that resync in parallel still occupy
a single slot. L_SYNC_S/T and L_VERIFY_S/T count toward the cap; their
L_PAUSED_* counterparts do not.

When admitting paused volumes, prefer volumes whose resource already
has a running resync. That way we finish all volumes of one resource
before starting volumes of the next, instead of spreading the resync
of every resource thinly across the cluster.

Reuse the existing resync-pause mechanism by adding a new pause-reason
flag resync_susp_max_parallel[] alongside the existing user / peer /
dependency / other_c flags. The flag plumbs through the state-change
snapshot, the resync_suspended() aggregator that maps L_SYNC_* to
L_PAUSED_SYNC_*, the change-detection notifier path, and the per-
peer_device suspension-reason printer (shows "max_parallel,").

Re-evaluation runs in three places: on every state change that
finishes a resync (right next to the existing resume_next_sg() call),
on every write to /sys/module/drbd/parameters/max_parallel_resyncs,
and inline in drbd_start_resync() so the very first transition lands
in L_PAUSED_SYNC_* instead of flicking through L_SYNC_* first.

The flag is a local admission-control decision; it is not propagated
to peers, so the wire protocol is unchanged. Each node makes its own
admission decision based on its own max_parallel_resyncs.
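
The admission rule described above can be sketched as follows. This is a simplified userspace model, not the DRBD implementation: all names are hypothetical, each array element stands for one volume (struct drbd_device), and the sketch only admits paused volumes. It does not model what happens to already-running resyncs when the cap is lowered, which the commit message does not specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_volume {
	int resource_id;	/* resource this volume belongs to */
	bool wants_resync;	/* resync or verify is pending or running */
	bool susp_max_parallel;	/* paused because of the cap */
};

static bool sketch_is_running(const struct sketch_volume *v)
{
	return v->wants_resync && !v->susp_max_parallel;
}

/* Each volume occupies at most one slot, however many peers it syncs with. */
static size_t sketch_count_running(const struct sketch_volume *v, size_t n)
{
	size_t i, running = 0;

	for (i = 0; i < n; i++)
		if (sketch_is_running(&v[i]))
			running++;
	return running;
}

static bool sketch_resource_has_running(const struct sketch_volume *v,
					size_t n, int resource_id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (v[i].resource_id == resource_id && sketch_is_running(&v[i]))
			return true;
	return false;
}

/* Admit paused volumes until the cap is reached; max == 0 means unlimited. */
static void sketch_reevaluate(struct sketch_volume *v, size_t n,
			      unsigned int max)
{
	size_t i, running;
	int pass;

	assert(v != NULL || n == 0);
	running = sketch_count_running(v, n);

	/* Pass 0: resources that already have a running resync. Pass 1: any. */
	for (pass = 0; pass < 2; pass++) {
		for (i = 0; i < n; i++) {
			if (max != 0 && running >= max)
				return;
			if (!v[i].wants_resync || !v[i].susp_max_parallel)
				continue;
			if (pass == 0 &&
			    !sketch_resource_has_running(v, n, v[i].resource_id))
				continue;
			v[i].susp_max_parallel = false;
			running++;
		}
	}
}
```

With a cap of 2, one running volume of resource 1, and paused volumes in resources 2 and 1, the sketch admits the second volume of resource 1 first, so that resource finishes before resource 2 starts.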

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
This lets drbdsetup status display the parallel-resync suspension
explicitly, alongside the other resync_susp_* reasons.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
df-build-team force-pushed the sync_us--master branch 2 times, most recently from 8927d96 to 97e5365, on May 8, 2026 08:03
The compat patches are concatenated into a single .compat.cocci before
spatch processes them, so metavariable names declared in one rule leak
into the parser state for all subsequent rules. This produced a flood
of "previously declared as a metavariable, is used as an identifier"
and "should X be a metavariable?" warnings whenever a later rule used
a name (bdev, device, info, skb, order) that an earlier rule had bound.

Declare these names with 'symbol' (or, for the handle->bdev struct
field accesses where 'symbol' does not silence the warning, a
regex-constrained 'identifier bdev =~ "^bdev$"') so spatch knows they
are literal C identifiers in this rule. Also drop the unused 'e' and
'lim' metavariables from queue_limits_features and add the missing
'identifier device' to the first rule of queue_flag_discard.

No functional change to the generated patches.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
df-build-team force-pushed the sync_us--master branch 3 times, most recently from f2f7062 to d70f033, on May 11, 2026 08:11
Upstream Linux v6.12 commit aaed5c7739be ("kbuild: slim down package
for building external modules") removed the explicit copy of .config
into the linux-headers Debian package built by make bindeb-pkg. With
that change, $(objtree)/.config is absent in installed headers for any
kernel produced via stock bindeb-pkg, and the date -r / touch -r calls
that drive the compat-h freshness check abort the build.

include/config/auto.conf is regenerated from .config on every kconfig
change, is part of the public kbuild interface, and is shipped by every
kernel-headers package (Ubuntu, Debian, RHEL kernel-devel, and stock
bindeb-pkg all include it). Use it as the freshness reference instead.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
JoelColledge and others added 4 commits May 12, 2026 14:06
Consider the following configuration:
* DRBD Nodes A, B, C
* Dedicated DRBD proxy nodes P, Q
* Direct connection A-B
* Proxy connections A-P-Q-C and B-P-Q-C

Now there are 2 connections "Q-C" from the proxy inside to DRBD. Prior
to this change, if the same port is used on C for each of these
connections, then DRBD on C cannot distinguish the incoming TCP
connections. More generally, when two paths share the same listener and
the same peer IP, the old IP-only lookup always returned the first
matching path. Incoming sockets for the second path were then silently
misrouted to the first, causing "Peer presented a node_id of X instead
of Y" errors and wait-connect timeouts.

When searching for the path for an incoming connection, prefer an exact
(IP, port) match. A corresponding change to DRBD proxy causes it to bind
outgoing inside connections to the configured port. The result is that
DRBD can reliably distinguish the incoming connections. Direct
DRBD-to-DRBD peers use an ephemeral source port so they continue to
match via the IP-only fallback, preserving existing behaviour for all
configurations.
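
The lookup order can be sketched as below. This is a simplified illustration, not the DRBD function: the struct and function names are hypothetical, only IPv4 is modelled, and it assumes the incoming connection's source port is compared against the port configured for the peer on each path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_path {
	uint32_t peer_ip;	/* IPv4 only, to keep the sketch short */
	uint16_t peer_port;	/* port configured for the peer on this path */
	int node_id;		/* node expected at the other end */
};

/*
 * Prefer an exact (IP, port) match; otherwise fall back to the first
 * path whose IP matches. Returns NULL if no path matches at all.
 */
static const struct sketch_path *
sketch_find_path(const struct sketch_path *paths, size_t n,
		 uint32_t src_ip, uint16_t src_port)
{
	const struct sketch_path *ip_only = NULL;
	size_t i;

	assert(paths != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (paths[i].peer_ip != src_ip)
			continue;
		if (paths[i].peer_port == src_port)
			return &paths[i];	/* exact match wins */
		if (!ip_only)
			ip_only = &paths[i];	/* remember first IP-only hit */
	}
	return ip_only;
}
```

Two paths that share the proxy's IP but differ in port are now told apart, while a direct peer connecting from an ephemeral port still resolves through the IP-only fallback.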

Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
…ags-y'

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
…nfig/auto.conf, not .config'

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
…rbd_find_path_by_addr()'

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Bit 8 (val 256) is already used for DRBD_FF_BM_BLOCK_SHIFT on master.
This incident is a pure reminder that forking the drbd-headers is a
bad idea.
Philipp-Reisner and others added 15 commits June 18, 2026 10:55
A received "unallocated" (day0) bitmap UUID is a resync hint, not a shared
ancestor.

  before: receive_uuids110 -> update_bitmap_slot_of_peer -> push to history
          (persists; survives an "unrelated data" abort)
          -> retry: RULE_HISTORY_BOTH match -> split brain

  after:  unalloc slot -> peer_device->bitmap_uuids[] only (drives the
          handshake); never pushed to history

The handshake decision uses bitmap_uuids[] / our own peers[] slots, not
history, so day0-based resync is unaffected. Allocated / MDF_NODE_EXISTS
slots are still recorded, preserving the existing avoidance of false split
brains (recording a peer's bitmap slots in our own history).
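
The "after" behaviour can be modelled roughly as follows. This is a toy sketch under stated assumptions: the names are hypothetical, the history is a plain append-only array rather than DRBD's real history handling, and the caller is assumed to already know whether the received slot is the unallocated one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PEERS 4
#define SKETCH_HISTORY   8

struct sketch_peer_device {
	uint64_t bitmap_uuids[SKETCH_MAX_PEERS];  /* drives the handshake */
};

struct sketch_md {
	uint64_t history[SKETCH_HISTORY];	  /* persisted ancestors */
	int history_cnt;
};

static void sketch_push_history(struct sketch_md *md, uint64_t uuid)
{
	assert(md->history_cnt < SKETCH_HISTORY);
	md->history[md->history_cnt++] = uuid;
}

static bool sketch_in_history(const struct sketch_md *md, uint64_t uuid)
{
	int i;

	for (i = 0; i < md->history_cnt; i++)
		if (md->history[i] == uuid)
			return true;
	return false;
}

/* Record one bitmap UUID received from the peer. */
static void sketch_receive_bitmap_uuid(struct sketch_md *md,
				       struct sketch_peer_device *pd,
				       int slot, uint64_t uuid,
				       bool unallocated)
{
	assert(slot >= 0 && slot < SKETCH_MAX_PEERS);

	pd->bitmap_uuids[slot] = uuid;	/* always visible to the handshake */
	if (unallocated)
		return;			/* resync hint, not an ancestor */
	sketch_push_history(md, uuid);	/* allocated slots are still recorded */
}
```

Since the hint never reaches the history, a retry after an "unrelated data" abort has no stale history entry for it to match against.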

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
…esent

day0 UUID = previous current UUID at the first bump.

  _drbd_uuid_push_history():    history empty -> record day0 (even if it
                                is also still held in a bitmap slot)
  __new_current_uuid_prepare(): first bump (day0) -> push day0 to history
  _drbd_send_uuids110():        originate UUID_FLAG_HAS_UNALLOC only when
                                drbd_unallocated_index() != -1

max-peers > peers: a real unallocated day0 bitmap exists; advertise it and
the peer resyncs from it (bitmap_mod_after_handshake() copies the slot).
Otherwise the day0 UUID now in history resolves the handshake via
RULE_HISTORY_* -> SET_BITMAP (full sync). With the receiver change a day0
hint is never recorded as an ancestor.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
…rors

When DRBD is configured with TLS, a TLS handshake that fails with a
transient socket error (e.g. -ETIMEDOUT) currently propagates out of
dtt_connect() unchanged. The connect-cycle caller treats anything other
than -EAGAIN as fatal, so the connection moves to StandAlone instead of
retrying.

Fix this by translating the same set of transient socket errors that
dtt_try_connect() already maps to -EAGAIN at each of the three TLS
handshake error sites (tls_init_hello for both sockets and
tls_wait_hello). A retriable error now restarts the connect cycle; other
errors continue to surface as fatal.
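
The translation step can be sketched as a small helper. The function name is hypothetical, and the list of errno values is an illustrative assumption. The authoritative set is whatever dtt_try_connect() maps, which this message does not enumerate.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map transient socket errors to -EAGAIN so the caller retries the
 * connect cycle; leave every other value untouched. Takes and returns
 * kernel-style negative errno values (0 means success).
 */
static int sketch_map_transient_error(int err)
{
	assert(err <= 0);

	switch (-err) {
	case ETIMEDOUT:		/* illustrative set, see note above */
	case EINTR:
	case EINPROGRESS:
	case ECONNREFUSED:
	case ENETUNREACH:
	case EHOSTUNREACH:
		return -EAGAIN;
	default:
		return err;
	}
}
```

Applying one such helper at each of the three handshake error sites keeps the retry policy identical to the plain TCP connect path, instead of duplicating the errno list.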

Signed-off-by: Joel Colledge <joel.colledge@linbit.com>
df-build-team force-pushed the sync_us--master branch 7 times, most recently from 52e7c43 to 99e64ca, on June 25, 2026 08:11
df-build-team force-pushed the sync_us--master branch 4 times, most recently from f05ad1f to d9c071c, on June 29, 2026 08:07
…-master

Signed-off-by: DF Build Team <df-build-team@redhat.com>
7 participants